Abstract: We address the problem of delay in an arithmetic
coding system. Due to the nature of the arithmetic coding process, 
source sequences causing arbitrarily large encoding or decoding 
delays exist. This phenomenon raises the question of just how large
is the expected input to output delay in these systems, i.e., once a 
source sequence has been encoded, what is the expected number 
of source letters that should be further encoded to allow full 
decoding of that sequence. In this paper, we derive several new 
upper bounds on the expected delay for a memoryless source, 
which improve upon a known bound due to Gallager. The bounds 
provided are uniform in the sense of being independent of the 
sequence's history. In addition, we give a sufficient condition for 
a source to admit a bounded expected delay, which holds for a 
stationary ergodic Markov source of any order. 

I. Introduction 

Arithmetic coding was introduced by Elias [1] as a
simple means to sequentially encode a source at its entropy
rate, while significantly reducing the extensive memory usage
characterizing non-sequential schemes. The basic idea underlying
this technique is the successive mapping of growing
source sequences into shrinking intervals of size equal to the
probability of the corresponding sequence, and then representing
those intervals by a binary expansion. Other coding
schemes reminiscent of Elias' arithmetic coding have been
suggested since, aimed mostly at overcoming the precision
problem of the original scheme [2], [3].

Delay in the classical setting of arithmetic coding stems 
from the discrepancy between source intervals and binary 
intervals, which may prohibit the encoder from producing 
bits (encoding delay) or the decoder from reproducing source 
letters (decoding delay). On top of its usual downside, delay 
also increases memory usage, and therefore a large delay may 
turn the main advantage of arithmetic coding on its head. As
it turns out, for most sources there exist infinitely many
source sequences for which the delay is infinite, where each
such sequence usually occurs with probability zero. A well-known
example demonstrating this phenomenon is that of a uniform
source over a ternary alphabet {0, 1, 2}. The source sequence
111... is mapped into shrinking intervals that always contain
the point 1/2, and so not even a single bit can be encoded.
This observation leads to the question of just how large
the expected delay (and consequently, the expected memory
usage) of the arithmetic coding process is for a given source, and
whether it is bounded at all.
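The ternary example can be checked numerically. The following is a minimal sketch (the function name `interval_of_all_ones` is ours, introduced only for illustration) that builds the source interval of the all-ones sequence with exact rational arithmetic and confirms that 1/2 lies strictly inside it, so the interval fits in neither half of the unit interval and no bit can be released.

```python
from fractions import Fraction

def interval_of_all_ones(n):
    """Source interval of the length-n sequence 11...1 for a uniform ternary source."""
    lo, width = Fraction(0), Fraction(1)
    for _ in range(n):
        width /= 3      # every letter has probability 1/3
        lo += width     # letter 1 selects the middle third of the current interval
    return lo, lo + width

half = Fraction(1, 2)
for n in (1, 2, 5, 20):
    lo, hi = interval_of_all_ones(n)
    # 1/2 strictly inside: the interval fits in neither [0, 1/2) nor [1/2, 1)
    print(n, lo < half < hi)
```

The interval is symmetric around 1/2 for every n, which is why this particular sequence never allows the encoder to emit a bit.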

The problem of delay can be practically dealt with by
inserting a fictitious source letter into the stream to "release"
bits from the encoder or letters from the decoder
whenever the delay exceeds some predetermined threshold.
Another possibility is coding of finite length sequences, so that 
a prefix condition is satisfied at the expense of a slightly higher 
redundancy, and blocks can be concatenated [4]. Nevertheless, 
it is still interesting to analyze the classical sequential setting 
in terms of expected delay. 

In his lecture notes [5], Gallager has provided an upper 
bound for the expected delay in arithmetic coding for a 
memoryless source, which was later generalized to coding over 
cost channels [6]. Gallager's bound is given by

$$E(D) \le \frac{\log(8e^2/\beta)}{\log(1/\alpha)} \triangleq \mathcal{D}_g(\alpha,\beta)$$

where $\alpha$ and $\beta$ are the maximal and minimal source letter
probabilities, respectively. Notice that this bound is independent
of the sequence's history, as shall be the case with all the
bounds presented in this paper. In Theorem 1 (Section III) we
derive a new upper bound for the expected delay, given by



$$E(D) \le 1 + \frac{4\alpha\left(1-\alpha+\log(1/\alpha)\right)}{(1-\alpha)^2} \triangleq \mathcal{D}_1(\alpha)$$

which depends only on the most favorable source letter.
Following that, we show that the dependence on the least
favorable letter in Gallager's bound is unnecessary, and provide
(Section IV) a uniformly tighter version of the bound, given
by $\mathcal{D}_{mg}(\alpha) = \mathcal{D}_g(\alpha,\alpha)$. In Theorem 2 (Section V) we derive
another bound $\mathcal{D}_2(\alpha)$, uniformly tighter than $\mathcal{D}_1(\alpha)$, which is
also shown to be tighter than $\mathcal{D}_{mg}(\alpha)$ for most sources, and
looser only by a small multiplicative factor otherwise.

Our technique is extended to sources with memory, and in
Theorem 3 (Section VI) we provide a new sufficient condition
for a source to have a bounded expected delay under arithmetic 
coding. Specifically, this condition is shown to hold for any 
stationary ergodic Markov source over a finite alphabet. 

II. Arithmetic Coding in a Nutshell 

Consider a discrete source over a finite alphabet
$\mathcal{X} = \{0, 1, \ldots, K-1\}$ with positive letter probabilities
$\{p_0, p_1, \ldots, p_{K-1}\}$. A finite source sequence is denoted by
$x_m^n = \{x_m, x_{m+1}, \ldots, x_n\}$ with $x^n = x_1^n$, while an infinite
one is denoted by $x^\infty$. An arithmetic coder maps the
sequences $x^n, x^{n+1}, \ldots$ into a sequence of nested source
intervals $\mathcal{I}(x^n) \supseteq \mathcal{I}(x^{n+1}) \supseteq \ldots$ in the unit interval that
converge to a point $y(x^\infty) = \bigcap_{n=1}^{\infty} \mathcal{I}(x^n)$. The mapping is
defined as follows:

$$f_1(i) \triangleq \sum_{j=0}^{i-1} p_j, \qquad f(x^1) = f_1(x_1)$$

$$f(x^n) = f(x^{n-1}) + f_1(x_n)\Pr(x^{n-1})$$

$$\mathcal{I}(x^n) = \left[f(x^n),\; f(x^n) + \Pr(x^n)\right)$$

Notice that $|\mathcal{I}(x^n)| = \Pr(x^n)$ and that source intervals
corresponding to different sequences of the same length are
disjoint. Following that, a random source sequence $X^n$ is
mapped into a random interval $\mathcal{I}(X^n)$, which as $n$ grows
converges to a random variable $Y(X^\infty)$ that is uniformly
distributed over the unit interval.
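As an illustration of the mapping, the following sketch computes the source interval of a sequence for a memoryless source. It assumes letters are the integers 0, ..., K-1 and that probabilities are given as exact fractions; the function name `source_interval` is hypothetical.

```python
from fractions import Fraction

def source_interval(seq, probs):
    """Return (f, f + Pr) for the sequence `seq` under the memoryless model `probs`."""
    # f1[i] = p_0 + ... + p_{i-1}, the cumulative probability below letter i
    f1 = [sum(probs[:i], Fraction(0)) for i in range(len(probs))]
    f, pr = Fraction(0), Fraction(1)   # the empty sequence maps to [0, 1)
    for x in seq:
        f += f1[x] * pr                # f(x^n) = f(x^{n-1}) + f_1(x_n) Pr(x^{n-1})
        pr *= probs[x]                 # Pr(x^n) = Pr(x^{n-1}) p_{x_n}
    return f, f + pr

probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
print(source_interval([1, 0], probs))
```

The interval width always equals the probability of the sequence, and extending a sequence by one letter yields a sub-interval of the previous one.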

For any sequence of binary digits $b^k = \{b_1, b_2, \ldots, b_k\}$ we
define a corresponding binary interval

$$\mathcal{J}(b^k) = \left[0.b_1 b_2 \ldots b_k 000\ldots,\; 0.b_1 b_2 \ldots b_k 111\ldots\right) \tag{1}$$

and the midpoint of $\mathcal{J}(b^k)$ is denoted by $m(b^k)$.

The process of arithmetic coding is performed as follows.
The encoder maps the input letters $x^n$ into a source interval
according to the mapping above, and outputs the bits representing the smallest
binary interval $\mathcal{J}(b^k)$ containing the source interval $\mathcal{I}(x^n)$.
This process is performed sequentially so the encoder produces 
further bits whenever it can. The decoder maps the received 
bits into a binary interval, and outputs source letters that 
correspond to the minimal source interval that contains that 
binary interval. Again, this process is performed sequentially 
so the decoder produces further source letters whenever it can. 
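The encoder rule described above can be sketched as follows. This is only an illustration under the assumption of exact rational interval endpoints with lo < hi; the function name `emitted_bits` is ours. It descends the tree of binary intervals and stops as soon as the midpoint of the current binary interval falls strictly inside the source interval.

```python
from fractions import Fraction

def emitted_bits(lo, hi):
    """Bits of the smallest binary interval containing the source interval [lo, hi)."""
    bits = []
    b_lo, b_width = Fraction(0), Fraction(1)
    while True:
        mid = b_lo + b_width / 2
        if hi <= mid:        # [lo, hi) fits in the lower half
            bits.append(0)
        elif lo >= mid:      # [lo, hi) fits in the upper half
            bits.append(1)
            b_lo = mid
        else:                # midpoint strictly inside: no further bit can be emitted
            return bits
        b_width /= 2

print(emitted_bits(Fraction(1, 3), Fraction(2, 3)))   # middle third of the unit interval
print(emitted_bits(Fraction(0), Fraction(1, 3)))      # lower third
```

For the middle third no bit is produced, which is the encoding delay discussed in the introduction.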

III. Memoryless Source 

In this section, we provide a new bound for the expected 
delay of an arithmetic coding system for a memoryless source, 
as a function of the probability of the most likely source letter 

$$\alpha \triangleq \max_k p_k.$$

All logarithms in this paper are taken to the base of 2. 

Theorem 1: Assume a sequence of $n$ source letters $x^n$ has
been encoded, and let $D$ be the number of extra letters that
need to be encoded to allow $x^n$ to be fully decoded. Then

$$\Pr(D > d) \le 4\alpha^d\left(1 + d\log(1/\alpha)\right) \tag{2}$$

independent of $x^n$. The expected delay is correspondingly
bounded by

$$E(D) \le 1 + \frac{4\alpha\left(1-\alpha+\log(1/\alpha)\right)}{(1-\alpha)^2} = \mathcal{D}_1(\alpha). \tag{3}$$
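The relation between the tail bound (2) and the expected-delay bound (3) can be verified numerically. The sketch below (function names are ours) evaluates the closed form in (3) and compares it with one plus a truncated sum of the right-hand side of (2) over d = 1, 2, ...

```python
import math

def tail_bound(alpha, d):
    """Right-hand side of (2): 4 a^d (1 + d log2(1/a))."""
    return 4 * alpha**d * (1 + d * math.log2(1 / alpha))

def d1_bound(alpha):
    """Right-hand side of (3)."""
    return 1 + 4 * alpha * (1 - alpha + math.log2(1 / alpha)) / (1 - alpha) ** 2

def d1_by_summation(alpha, terms=10_000):
    """1 + sum over d >= 1 of tail_bound(alpha, d), truncated after `terms` terms."""
    return 1 + sum(tail_bound(alpha, d) for d in range(1, terms + 1))

for alpha in (0.1, 1 / 3, 0.5, 0.8):
    print(alpha, d1_bound(alpha), d1_by_summation(alpha))
```

For a source whose most likely letter has probability 1/2, the closed form evaluates to 13.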

Let us first outline the idea behind the proof. The sequence
$x^n$ has been encoded into the binary sequence $b^k$,
which represents the minimal binary interval $\mathcal{J}(b^k)$ satisfying
$\mathcal{I}(x^n) \subseteq \mathcal{J}(b^k)$. The decoder has so far been able to decode
only $m \le n$ letters, where $m$ is maximal such that
$\mathcal{J}(b^k) \subseteq \mathcal{I}(x^m)$. After $d$ more source letters are fed to the encoder,
$x^{n+d}$ is encoded into $b^{k'}$, where $k' \ge k$ is maximal such that
$\mathcal{I}(x^{n+d}) \subseteq \mathcal{J}(b^{k'})$. Thus, the entire sequence $x^n$ is decoded
if and only if

$$\mathcal{I}(x^{n+d}) \subseteq \mathcal{J}(b^{k'}) \subseteq \mathcal{I}(x^n). \tag{4}$$

Now, consider the middle point $m(b^k)$, which is always
contained inside $\mathcal{I}(x^n)$, as otherwise another bit could have
been encoded. If $m(b^k)$ is contained in $\mathcal{I}(x^{n+d})$ (but not
as an edge), then condition (4) cannot be satisfied, and the
encoder cannot yield even one further bit. This observation
can be generalized to a set of points which, if contained in
$\mathcal{I}(x^{n+d})$, prevent $x^n$ from being completely decoded. For each of these
points the encoder outputs a number of bits which may enable
the decoder to produce source letters, but not enough to fully
decode $x^n$. The encoding and decoding delays are therefore
treated here simultaneously, rather than separately as in [6].

We now introduce some notation and prove a lemma
required for the proof of Theorem 1. Let $[a, b) \subseteq [0, 1)$
be some interval, and $p$ some point in that interval. In the
definitions that now follow we sometimes omit the dependence
on $a, b$ for brevity. We say that $p$ is strictly contained in $[a, b)$
if $p \in [a, b)$ but $p \ne a$. We define the left-adjacent of $p$ w.r.t.
$[a, b)$ to be

$$\ell(p) \triangleq \min\left\{x \in [a, p) : \exists k \in \mathbb{Z}^+,\ x = p - 2^{-k}\right\}$$

and the $t$-left-adjacent of $p$ w.r.t. $[a, b)$ as

$$\ell^{(t)}(p) \triangleq \ell\left(\ell^{(t-1)}(p)\right), \qquad \ell^{(0)}(p) \triangleq p.$$

Notice that $\ell^{(t)}(p) \to a$ monotonically with $t$. We also define
the right-adjacent of $p$ w.r.t. $[a, b)$ to be

$$r(p) \triangleq \max\left\{x \in (p, b) : \exists k \in \mathbb{Z}^+,\ x = p + 2^{-k}\right\}$$

and $r^{(t)}(p)$ as the $t$-right-adjacent of $p$ w.r.t. $[a, b)$ similarly,
where now $r^{(t)}(p) \to b$ monotonically. For any $\delta < b - a$,
the adjacent $\delta$-set of $p$ w.r.t. $[a, b)$ is defined as the set of all
adjacents that are not "too close" to the edges of $[a, b)$:

$$S_\delta(p) \triangleq \left\{x \in [a + \delta, b - \delta) : \exists t \in \mathbb{Z}^+ \cup \{0\},\ x = \ell^{(t)}(p) \,\vee\, x = r^{(t)}(p)\right\}$$

Notice that for $\delta > p - a$ this set may contain only
right-adjacents, for $\delta > b - p$ only left-adjacents, for $\delta > \frac{b-a}{2}$ it is
empty, and for $\delta = 0$ it is infinite.

Lemma 1: The size of $S_\delta(p)$ is bounded by

$$|S_\delta(p)| \le 1 + 2\log\frac{|b-a|}{\delta} \tag{5}$$

Proof: It is easy to see that the number of $t$-left-adjacents
of $p$ that are larger than $a + \delta$ is the number of ones in the
binary expansion of $(p - a)$ up to resolution $\delta$. Similarly, the
number of $t$-right-adjacents of $p$ that are smaller than $b - \delta$ is
the number of ones in the binary expansion of $(b - p)$ up to
resolution $\delta$. Defining $\lceil x \rceil^+ \triangleq \max(\lceil x \rceil, 0)$, we get:

$$|S_\delta(p)| \le \left\lceil \log\frac{p-a}{\delta} \right\rceil^+ + \left\lceil \log\frac{b-p}{\delta} \right\rceil^+ \le \begin{cases} 2 + \log\frac{(p-a)(b-p)}{\delta^2}, & \delta < p-a,\ b-p \\ 1 + \log\frac{|b-a|}{\delta}, & \text{o.w.} \end{cases} \le 1 + 2\log\frac{|b-a|}{\delta} \tag{6}$$

as desired. ■
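The adjacent $\delta$-set and the bound of Lemma 1 can be explored with exact rational arithmetic. The sketch below rests on our reading of the definitions: the $t$-adjacents are obtained by iterating the left/right adjacent maps, and $k$ ranges over the positive integers. All function names are hypothetical.

```python
from fractions import Fraction
import math

def left_adjacent(p, a):
    """Smallest x = p - 2^-k (k >= 1) with x >= a, or None if p == a."""
    if p <= a:
        return None
    k = 1
    while Fraction(1, 2**k) > p - a:    # largest power 2^-k not exceeding p - a
        k += 1
    return p - Fraction(1, 2**k)

def right_adjacent(p, b):
    """Largest x = p + 2^-k (k >= 1) with x < b, or None if p >= b."""
    if p >= b:
        return None
    k = 1
    while Fraction(1, 2**k) >= b - p:   # largest power 2^-k strictly below b - p
        k += 1
    return p + Fraction(1, 2**k)

def adjacent_delta_set(p, a, b, delta):
    """All t-left and t-right adjacents of p (t >= 0) lying in [a + delta, b - delta)."""
    assert delta > 0, "for delta = 0 the set may be infinite"
    points = set()
    x = p
    while x is not None and x >= a + delta:   # walk left: p, l(p), l(l(p)), ...
        points.add(x)
        x = left_adjacent(x, a)
    x = p
    while x is not None and x < b - delta:    # walk right: p, r(p), r(r(p)), ...
        points.add(x)
        x = right_adjacent(x, b)
    return {x for x in points if a + delta <= x < b - delta}

def lemma_bound(a, b, delta):
    """Right-hand side of (5)."""
    return 1 + 2 * math.log2(float((b - a) / delta))

a, b, p = Fraction(1, 3), Fraction(2, 3), Fraction(1, 2)
s = adjacent_delta_set(p, a, b, Fraction(1, 100))
print(sorted(s), len(s), lemma_bound(a, b, Fraction(1, 100)))
```

For the interval [1/3, 2/3) with p = 1/2 and $\delta$ = 1/100, the set consists of 1/2 together with two adjacents on each side.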

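Proof of Theorem 1: Assume the source sequence $x^n$ has
been encoded into the binary sequence $b^k$, and let $Y = Y(X^\infty)$.
Given $x^n$, $Y$ is uniformly distributed over $\mathcal{I}(x^n)$,
and thus for any interval $\mathcal{T}$

$$\Pr(Y \in \mathcal{T} \mid x^n) = \frac{|\mathcal{T} \cap \mathcal{I}(x^n)|}{|\mathcal{I}(x^n)|} \le \frac{|\mathcal{T}|}{|\mathcal{I}(x^n)|} \tag{7}$$

The size of the interval $\mathcal{I}(x^{n+d})$ for $d \ge 0$ is bounded by

$$|\mathcal{I}(x^{n+d})| = \Pr(x^{n+d}) = \Pr(x_{n+1}^{n+d} \mid x^n)\Pr(x^n) = \Pr(x_{n+1}^{n+d} \mid x^n)\,|\mathcal{I}(x^n)| \le \alpha^d\,|\mathcal{I}(x^n)| \tag{8}$$

Combining (7) and (8), we have that for any point $p \in \mathcal{I}(x^n)$

$$\Pr\left(p \in \mathcal{I}(X^{n+d}) \mid x^n\right) \le \Pr\left(|Y - p| \le \alpha^d\,|\mathcal{I}(x^n)| \;\middle|\; x^n\right) \le \frac{2\alpha^d\,|\mathcal{I}(x^n)|}{|\mathcal{I}(x^n)|} = 2\alpha^d \tag{9}$$

where the probabilities are taken w.r.t. "future" source
letters. For any interval $\mathcal{T} \subseteq \mathcal{I}(x^n)$ that shares an edge with
$\mathcal{I}(x^n)$ we have that

$$\Pr\left(\mathcal{T} \cap \mathcal{I}(X^{n+d}) \ne \emptyset \mid x^n\right) \le \frac{|\mathcal{T}| + \alpha^d\,|\mathcal{I}(x^n)|}{|\mathcal{I}(x^n)|} = \alpha^d + \frac{|\mathcal{T}|}{|\mathcal{I}(x^n)|} \tag{10}$$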

For any $\delta \ge 0$, let $S_\delta$ denote the adjacent $\delta$-set of $m(b^k)$ w.r.t.
the interval $\mathcal{I}(x^n)$. Given $x^n$, the probability that the delay
$D$ is larger than $d$ is the probability that (4) is not satisfied,
which in turn is equal to the probability that the intersection
$S_0 \cap \mathcal{I}(X^{n+d})$ is not empty. This fact is explained as follows.
As already shown, if $m(b^k)$ is strictly contained in $\mathcal{I}(X^{n+d})$
then the encoder emits no further bits, and the delay is larger
than $d$. Otherwise, assume $\mathcal{I}(X^{n+d})$ lies on the left side of
$m(b^k)$. Obviously, if $\mathcal{I}(X^{n+d}) \subseteq [\ell(m(b^k)), m(b^k))$, then $x^n$
is fully decoded since (4) is satisfied. However, if $\ell(m(b^k))$
is strictly contained in $\mathcal{I}(X^{n+d})$ then (4) is not satisfied, $x^n$
cannot be decoded and the delay is larger than $d$. The same
rationale also applies to $r(m(b^k))$. Continuing the argument
recursively, it is easy to see that $x^n$ can be decoded if and
only if no point of $S_0$ is strictly contained in $\mathcal{I}(X^{n+d})$.

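Now, notice that $S_\delta \subseteq S_0$, and that $S_0 \setminus S_\delta$ is contained in
two intervals of length $\delta$, both sharing an edge with $\mathcal{I}(x^n)$
(the situation is illustrated in Figure 1). Letting $\mathcal{J}_b$ denote a
general binary interval, we bound the delay's tail probability:

$$\begin{aligned}
\Pr(D > d \mid x^n) &= \Pr\left(\mathcal{I}(X^{n+d}) \nsubseteq \mathcal{J}_b,\ \forall \mathcal{J}_b \subseteq \mathcal{I}(x^n) \;\middle|\; x^n\right) \\
&= \Pr\left(S_0 \cap \mathcal{I}(X^{n+d}) \ne \emptyset \mid x^n\right) \\
&\le \Pr\left(S_\delta \cap \mathcal{I}(X^{n+d}) \ne \emptyset \mid x^n\right) + \Pr\left((S_0 \setminus S_\delta) \cap \mathcal{I}(X^{n+d}) \ne \emptyset \mid x^n\right) \\
&\le 2\alpha^d\,|S_\delta| + 2\left(\alpha^d + \frac{\delta}{|\mathcal{I}(x^n)|}\right) \\
&\le 2\alpha^d\left(1 + 2\log\frac{|\mathcal{I}(x^n)|}{\delta}\right) + 2\left(\alpha^d + \frac{\delta}{|\mathcal{I}(x^n)|}\right)
\end{aligned}$$

Lemma 1 and equations (9), (10) were used in the transitions.
Taking the derivative of the right-hand side of the last bound w.r.t. $\delta$
suggests the choice $\delta = 2\alpha^d\,|\mathcal{I}(x^n)|$ (the exact minimizer is larger
by a factor of $\log e$, since the logarithm is taken to base 2). We get:

$$\Pr(D > d \mid x^n) \le 2\alpha^d\left(1 + 2\log\frac{1}{2\alpha^d}\right) + 6\alpha^d = 4\alpha^d\left(1 + d\log(1/\alpha)\right)$$

and (2) is proved. Now, the expectation of $D$ given $x^n$ can be
bounded accordingly: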



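$$\begin{aligned}
E(D \mid x^n) &= \sum_{d=1}^{\infty} d\,\Pr(D = d \mid x^n) \\
&\le 1 + \sum_{d=1}^{\infty} \Pr(D > d \mid x^n) \\
&\le 1 + 4\sum_{d=1}^{\infty} \alpha^d\left(1 + d\log(1/\alpha)\right) \\
&= 1 + \frac{4\alpha\left(1-\alpha+\log(1/\alpha)\right)}{(1-\alpha)^2}
\end{aligned}$$

and (3) is proved. Notice that both of the bounds above are
uniform, so the dependence on $x^n$ can be removed. ■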
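The delay can also be estimated empirically and compared with the bound of Theorem 1. The following is a small Monte Carlo sketch, not part of the proof: it assumes a memoryless source with exact rational probabilities, uses the fact that condition (4) holds if and only if the smallest binary interval containing $\mathcal{I}(x^{n+d})$ is contained in $\mathcal{I}(x^n)$, and all names and the parameter `max_extra` are our own.

```python
import math
import random
from fractions import Fraction

def smallest_binary_interval(lo, hi):
    """Smallest binary interval containing [lo, hi), returned as (left, right)."""
    b_lo, w = Fraction(0), Fraction(1)
    while True:
        mid = b_lo + w / 2
        if hi <= mid:
            w /= 2
        elif lo >= mid:
            b_lo, w = mid, w / 2
        else:
            return b_lo, b_lo + w

def delay(probs, n, rng, max_extra=500):
    """Draw x^n plus extra letters; return the number of extra letters
    needed until condition (4) holds for x^n."""
    cum = [sum(probs[:i], Fraction(0)) for i in range(len(probs))]
    weights = [float(p) for p in probs]
    lo, width = Fraction(0), Fraction(1)
    target = None
    for t in range(n + max_extra + 1):
        if t == n:
            target = (lo, lo + width)                    # the interval I(x^n)
        if t >= n:
            j_lo, j_hi = smallest_binary_interval(lo, lo + width)
            if target[0] <= j_lo and j_hi <= target[1]:  # binary interval inside I(x^n)
                return t - n
        x = rng.choices(range(len(probs)), weights=weights)[0]
        lo += cum[x] * width
        width *= probs[x]
    raise RuntimeError("sequence not decoded within max_extra letters")

def d1_bound(alpha):
    """Right-hand side of (3)."""
    return 1 + 4 * alpha * (1 - alpha + math.log2(1 / alpha)) / (1 - alpha) ** 2

if __name__ == "__main__":
    rng = random.Random(0)
    third = [Fraction(1, 3)] * 3
    samples = [delay(third, 5, rng) for _ in range(300)]
    print("empirical mean delay:", sum(samples) / len(samples))
    print("bound D_1(1/3):", d1_bound(1 / 3))
```

For a uniform binary source every source interval is itself a binary interval, so the measured delay is always zero; for the uniform ternary source at least one extra letter is always needed.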

Fig. 1. Source interval illustration

IV. Improving Gallager's Bound 

Gallager [5] provided an upper bound for the expected delay
in arithmetic coding of a memoryless source, given by

$$E(D) \le \frac{\log(8e^2/\beta)}{\log(1/\alpha)} \triangleq \mathcal{D}_g(\alpha,\beta) \tag{11}$$

where $\alpha = \max_k p_k$ and $\beta = \min_k p_k$. Notice that our bound
$\mathcal{D}_1(\alpha)$ in (3) depends only on the most likely source letter,
while Gallager's bound $\mathcal{D}_g(\alpha,\beta)$ depends also on the least
likely source letter. Moreover, holding $\alpha$ constant we find
that $\mathcal{D}_g(\alpha,\beta) \to \infty$ as $\beta \to 0$. This phenomenon is demonstrated in the
following example.

Example: Consider a ternary source with letter probabilities
$\left(p, \frac{1-p}{2}, \frac{1-p}{2}\right)$. Both bounds for that source are depicted in
Figure 2 as a function of $p$, together with a modified bound
derived in the sequel. As can be seen, Gallager's bound is
better for most values of $p$, but becomes worse for small $p$,
due to its dependence on the least probable source letter. In
fact, the bound diverges when $p \to 0$, which is counterintuitive
since we expect that the delay in this case will approach that of
a uniform binary source (for which $\mathcal{D}_g(\alpha,\beta)$ is finite). In
contrast, the new bound, which depends only on the most likely
letter, tends to a constant when $p \to 0$, which equals its value
for the corresponding binary case.



Intuition suggests that the least likely letters are those that
tend to accelerate the coding/decoding process, and that the
dominating factor influencing the delay should be the most
likely source letters. Motivated by that, we turn to examine
the origin of the term $\beta$ in Gallager's derivations.

Gallager's bound for the expected delay is derived via
a corresponding bound on the information delay, i.e., the
difference in self-information between a source sequence and
an extended source sequence needed to ensure that the original
sequence is completely decoded. We remind the reader that the
self-information of a sequence $x^n$ is just $I(x^n) = -\log(\Pr(x^n))$. We
now follow the derivations in [6], replacing notations with our
own and modifying the proof to remove the dependence on
$\beta$. Notice that [6] analyzes the more general setting of cost
channels, which reduces to that of [5] and to ours by setting
$N = 2$, $C = c_i = c_{\max} = 1$ (in the notation therein).

Consider a source sequence encoded by a binary sequence
$b^k$. A bound on the expected self-information of that sequence
with the last letter truncated is given by [6, equations 10, 11]

$$E\left(I(x^{n(k)-1}) \mid b^k\right) \le k + \log(2e) \tag{13}$$

where $n(k)$ is the number of source letters emitted by the
source, and $I(x^{n(k)-1})$ is the self-information of the corresponding
source sequence without the last letter. Using the
relation $I(x^n) \le I(x^{n-1}) + \log(1/\beta)$, we get a bound on the
self-information of the sequence [6, equation 14]:

$$E\left(I(x^{n(k)}) \mid b^k\right) \le k + \log(2e/\beta) \tag{14}$$

This is the only origin of the term $\beta$. In order to obtain a bound
on the expected information delay, there seems to be no escape
from the dependence on $\beta$. However, we are interested in the
delay in source letters. We therefore continue to follow [6] but
use (13) in lieu of (14) to bound the information delay up to
one letter before the last needed for decoding. This approach
eliminates the dependence on the least likely letter, which, if
it appears last, may increase the self-information considerably
while contributing only a single time unit to the delay.
Consider a specific source sequence $x^n$. A bound on the
expected number of bits $k(n)$ required to decode that sequence
is given by [6, equation 15]:

$$E\left(k(n) \mid x^n\right) \le I(x^n) + \log(4e) \tag{15}$$

Now, let $b^{k(n)}$ be the binary sequence required to decode $x^n$.
Using (13) (instead of (14), which is used in [5] and [6]) we have that

$$E\left(I(x^{n+D-1}) \mid b^{k(n)}, x^n\right) \le k(n) + \log(2e) \tag{16}$$

where $D$ is the number of extra letters needed to ensure the
encoder emits the necessary $k(n)$ bits. Using (15) and taking
the expectation w.r.t. $k(n)$ we find that



$$E\left(I(x^{n+D-1}) \mid x^n\right) - I(x^n) \le \log(8e^2) \tag{17}$$

and the modified bound for the delay in source letters follows
by dividing (17) by the minimal letter self-information
$\log(1/\alpha)$ and rearranging the terms:

$$E(D \mid x^n) \le 1 + \frac{\log(8e^2)}{\log(1/\alpha)} \triangleq \mathcal{D}_{mg}(\alpha)$$

Notice that the modified Gallager bound $\mathcal{D}_{mg}(\alpha) = \mathcal{D}_g(\alpha,\alpha)$
is uniformly lower than $\mathcal{D}_g(\alpha,\beta)$, and coincides with it only for
uniformly distributed sources.

Example (continued): The modified Gallager bound for the
ternary source converges for $p \to 0$, as illustrated in Figure 2.
It is also easy to verify that it converges to the same value
it takes for a uniform binary source.
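The behavior of the three bounds for the ternary example can be reproduced numerically. The sketch below (function names are ours) evaluates Gallager's bound, the modified Gallager bound and the bound of Theorem 1 for letter probabilities (p, (1-p)/2, (1-p)/2), with all logarithms taken to base 2 as in the paper.

```python
import math

def gallager_bound(alpha, beta):
    """log(8 e^2 / beta) / log(1 / alpha)."""
    return math.log2(8 * math.e**2 / beta) / math.log2(1 / alpha)

def modified_gallager_bound(alpha):
    """1 + log(8 e^2) / log(1 / alpha)."""
    return 1 + math.log2(8 * math.e**2) / math.log2(1 / alpha)

def new_bound(alpha):
    """1 + 4 a (1 - a + log(1/a)) / (1 - a)^2."""
    return 1 + 4 * alpha * (1 - alpha + math.log2(1 / alpha)) / (1 - alpha) ** 2

def ternary_bounds(p):
    """The three bounds for letter probabilities (p, (1-p)/2, (1-p)/2)."""
    probs = (p, (1 - p) / 2, (1 - p) / 2)
    alpha, beta = max(probs), min(probs)
    return gallager_bound(alpha, beta), modified_gallager_bound(alpha), new_bound(alpha)

for p in (1e-9, 1e-3, 0.1, 1 / 3):
    print(p, ternary_bounds(p))
```

As p shrinks, the first value grows without bound while the other two approach their values for a uniform binary source.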



Fig. 2. Bounds for the ternary source (curves: new bound, Gallager's bound, modified Gallager's bound)

The ratio of our bound to the modified Gallager bound
is depicted in Figure 3, together with two tighter bounds
introduced in the following section. Comparing the bounds,
we find that $\mathcal{D}_1(\alpha)$ is at most $\sim 2.4$ times worse than $\mathcal{D}_{mg}(\alpha)$,
and is even better for small (below $\sim 0.069$) values of $\alpha$. For
$\alpha \to 0$ the ratio tends to unity, since both $\mathcal{D}_1(\alpha)$ and $\mathcal{D}_{mg}(\alpha)$
approach 1, the minimal possible delay for a source that is
not 2-adic. Indeed, for a very small $\alpha$ it is intuitively clear
that even when a single extra letter is encoded, the source
interval decreases significantly, which enables decoding of the
preceding source interval with high probability.

V. Improving Our Bound 

As we have seen, $\mathcal{D}_1(\alpha)$ is good for small values of $\alpha$ (the
probability of the most likely letter) and becomes worse for
larger values. The source of this behavior lies in a somewhat
loose analysis of the size of $S_\delta$ for large $\delta$, and also in the fact that for
large $\alpha$ and small $d$ the bound (2) may exceed unity. A more
subtle analysis enables us to improve our bound for large $\alpha$,
and the result is now stated without proof.

Theorem 2: Let

$$d_0 \triangleq \left\lfloor \frac{[\ldots]}{\log(1/\alpha)} \right\rfloor \tag{18}$$

and define $d_1 \ge d_0$ to be
the largest such integer for which every integer $d_0 \le d \le d_1$
(if there are any) satisfies

$$2\alpha^d\left(1 + 2d\log(1/\alpha)\right) \ge 1$$

The expected delay of an arithmetic coding system for a
memoryless source is bounded by

$$E(D) \le \mathcal{D}_2(\alpha) \triangleq 1 + d_1 + \frac{2\alpha^{d_1+1}}{1-\alpha} + \frac{4\alpha^{d_1+1}\left(d_1(1-\alpha)+1\right)\log(1/\alpha)}{(1-\alpha)^2} \tag{19}$$

An explicit bound $\mathcal{D}_3(\alpha)$ (though looser for large $\alpha$) can
be obtained by substituting $d_1 = d_0$. The ratios of our original
bound $\mathcal{D}_1(\alpha)$, the modified bound $\mathcal{D}_2(\alpha)$ and its looser version
$\mathcal{D}_3(\alpha)$ to the modified Gallager bound $\mathcal{D}_{mg}(\alpha)$ are depicted
in Figure 3. As can be seen, $\mathcal{D}_2(\alpha)$ is tighter than $\mathcal{D}_{mg}(\alpha)$ for
values of $\alpha$ smaller than $\sim 0.71$, and for larger values is looser
but only up to a multiplicative factor of $\sim 1.04$. Notice again
that all of the bounds coincide for $\alpha \to 0$, as in this case they
all tend to 1 which is the best possible general upper bound.
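Since Theorem 2 is stated without proof, the following check is only a plausibility sketch under our own assumption: that (19) arises from bounding the tail probability by 1 for $d \le d_1$ and by $2\alpha^d(1 + 2d\log(1/\alpha))$ for $d > d_1$. Under that assumption the closed form in (19) should equal the corresponding series for any given $d_1$, which is passed here as a plain parameter (function names are ours).

```python
import math

def d2_closed_form(alpha, d1):
    """Right-hand side of (19) for a given d1."""
    log_inv = math.log2(1 / alpha)
    tail = alpha ** (d1 + 1)
    return (1 + d1
            + 2 * tail / (1 - alpha)
            + 4 * tail * (d1 * (1 - alpha) + 1) * log_inv / (1 - alpha) ** 2)

def d2_by_summation(alpha, d1, terms=20_000):
    """(1 + d1) plus the truncated sum over d > d1 of 2 a^d (1 + 2 d log2(1/a))."""
    log_inv = math.log2(1 / alpha)
    series = sum(2 * alpha**d * (1 + 2 * d * log_inv)
                 for d in range(d1 + 1, d1 + 1 + terms))
    return 1 + d1 + series

for alpha, d1 in ((0.5, 2), (0.8, 5), (0.9, 10)):
    print(alpha, d1, d2_closed_form(alpha, d1), d2_by_summation(alpha, d1))
```

The two evaluations agree to numerical precision for each tested pair, which supports the reading of (19) as a truncated version of the summation used for (3).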




Fig. 3. The ratio of the different bounds to the modified Gallager bound, as a function of $\alpha$

VI. Sources with Memory 

The discussion of Section III is easily generalized to sources
with memory. The only point in the proof that needs to be
reestablished is the definition of $\alpha$, which was the probability
of the most likely source letter in the memoryless case.

Theorem 3: Consider an arithmetic coding system for a
source with a probability distribution $p(x^n)$ over a finite
alphabet $\mathcal{X}$. Let

$$\gamma(d) \triangleq \sup_n \max_{x^{n+d}} p\left(x_{n+1}^{n+d} \mid x^n\right)$$

If $\gamma(d) = o\left(d^{-(1+\varepsilon)}\right)$ for some $\varepsilon > 0$, then the expected
delay of the system is bounded.

Proof: The derivations for the memoryless case can be
repeated, with $\alpha^d$ replaced by $\gamma(d)$. The bound on the expected delay becomes

$$E(D \mid x^n) \le 1 + 4\sum_{d=1}^{\infty} \gamma(d)\left(1 + \log\frac{1}{\gamma(d)}\right)$$

If the sum above converges, then we have the bounded
expected delay property. The condition given in Theorem 3
is sufficient to that end. ■



For a memoryless source, $\gamma(d) = \alpha^d$ and the condition
is satisfied. It is also fulfilled for any source with memory
whose conditional letter probabilities are bounded away from
1, and thus such sources admit a bounded expected delay.
This fact was already observed in [6] with the additional
requirement for the conditional probabilities to be bounded
away from 0 as well (a byproduct of the dependency on the
least favorable letter). The condition in Theorem 3 is however
more general. As an example, consider a stationary ergodic
first-order Markov source. Such a source satisfies

$$p\left(x_{n+1}^{n+|\mathcal{X}|} \mid x^n\right) < 1, \qquad \forall x^{n+|\mathcal{X}|} \in \mathcal{X}^{n+|\mathcal{X}|} \tag{20}$$

since otherwise the source would have a deterministic cycle
which contradicts the ergodic assumption. Define:

$$\xi \triangleq \max_{x^{n+|\mathcal{X}|} \in \mathcal{X}^{n+|\mathcal{X}|}} p\left(x_{n+1}^{n+|\mathcal{X}|} \mid x^n\right)$$

We have from (20) that $\xi < 1$, and since the source is stationary,
$\xi$ is also independent of $n$. $\gamma(d)$ is monotonically
non-increasing and therefore $\gamma(d) \le \xi^{\lfloor d/|\mathcal{X}| \rfloor}$ and is exponentially
decreasing with $d$, thus satisfying the condition in Theorem 3.
This result can be generalized to any Markov order.
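The Markov argument can be illustrated on a small chain in which one transition is deterministic, so that the conditional letter probabilities are not bounded away from 1. The sketch below assumes a first-order chain given by a row-stochastic transition matrix `P` in which every state can occur as the conditioning state; the function name `gamma` and the example matrix are ours. It computes $\gamma(d)$ by a max-product recursion and checks the bound $\gamma(d) \le \xi^{\lfloor d/|\mathcal{X}| \rfloor}$.

```python
def gamma(P, d):
    """Maximum, over start states and length-d paths, of the path probability."""
    k = len(P)
    best = [1.0] * k   # best[s]: max probability of a path of the current length from s
    for _ in range(d):
        # extend by one step: move from s to t, then follow the best shorter path from t
        best = [max(P[s][t] * best[t] for t in range(k)) for s in range(k)]
    return max(best)

# State 0 moves to state 1 with probability 1; state 1 moves to 0 or 1 with probability 1/2.
P = [[0.0, 1.0],
     [0.5, 0.5]]
xi = gamma(P, len(P))   # xi is gamma evaluated at d = |X|
for d in range(1, 9):
    print(d, gamma(P, d), xi ** (d // len(P)))
```

Here $\gamma(1) = 1$, yet $\gamma(d)$ still decays exponentially because no path of length $|\mathcal{X}| = 2$ has probability one.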

Corollary 1: The expected delay of arithmetic coding for a 
finite alphabet, stationary ergodic Markov source of any order 
is bounded. 

VII. Summary 

New upper bounds on the expected delay of an arithmetic
coding system for a memoryless source were derived, as a
function of the probability $\alpha$ of the most likely source letter.
In addition, a known bound due to Gallager that depends
also on the probability of the least likely source letter was
uniformly improved by disposing of the latter dependence. Our
best bound was compared to the modified Gallager bound,
and shown to be tighter for $\alpha < 0.71$ and looser by a
multiplicative factor no larger than $\sim 1.04$ otherwise. The
bounding technique was generalized to sources with memory, 
providing a sufficient condition for a bounded delay. Using 
that condition, it was shown that the bounded delay property 
holds for a stationary ergodic Markov source of any order. 

Future research calls for a more precise characterization of 
the expected delay in terms of the entire probability distribu- 
tion, which might be obtained by further refining the bounding 
technique presented in this paper. In addition, a generalization 
to coding over cost channels and finite-state noiseless channels 
in the spirit of [6] can be considered as well. 